This mini project will review marathon result dataset in different countries in 2017,with some main variables such as gender,age,overal ranking,city,country which I worked on them mainly also some other variable to be used created for this report such as finishing time. Here I listed some main steps to start working on this dataset:
Step 1: Using the function read_csv() to read and variables are assigned for further use.
Step 2: Checking for missed data.
Give an overview on data by head command:
head(marathon_2017)
## Bib Name Age M.F City State Country X5K X10K
## 1 11 Kirui, Geoffrey 24 M Keringet KEN 0:15:25 0:30:28
## 2 17 Rupp, Galen 30 M Portland OR USA 0:15:24 0:30:27
## 3 23 Osako, Suguru 25 M Machida-City JPN 0:15:25 0:30:29
## 4 21 Biwott, Shadrack 32 M Mammoth Lakes CA USA 0:15:25 0:30:29
## 5 9 Chebet, Wilson 31 M Marakwet KEN 0:15:25 0:30:28
## 6 15 Abdirahman, Abdi 40 M Phoenix AZ USA 0:15:25 0:30:28
## X15K X20K Half X25K X30K X35K X40K Pace Proj.Time
## 1 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:02:53 0:04:57 -
## 2 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:03:14 0:04:58 -
## 3 0:45:44 1:01:16 1:04:36 1:17:00 1:33:01 1:48:31 2:03:38 0:04:59 -
## 4 0:45:44 1:01:19 1:04:45 1:17:00 1:33:01 1:48:58 2:04:35 0:05:03 -
## 5 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:41 2:05:00 0:05:04 -
## 6 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:52 2:05:21 0:05:04 -
## Official.Time Overall Gender Division
## 1 2:09:37 1 1 1
## 2 2:09:58 2 2 2
## 3 2:10:28 3 3 3
## 4 2:12:08 4 4 4
## 5 2:12:35 5 5 5
## 6 2:12:45 6 6 1
## Bib Name Age M.F City
## 0 0 0 0 0
## State Country X5K X10K X15K
## 0 0 0 0 0
## X20K Half X25K X30K X35K
## 0 0 0 0 0
## X40K Pace Proj.Time Official.Time Overall
## 0 0 0 0 0
## Gender Division
## 0 0
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
!colSums(is.na(marathon_2017))
## Bib Name Age M.F City
## TRUE TRUE TRUE TRUE TRUE
## State Country X5K X10K X15K
## TRUE TRUE TRUE TRUE TRUE
## X20K Half X25K X30K X35K
## TRUE TRUE TRUE TRUE TRUE
## X40K Pace Proj.Time Official.Time Overall
## TRUE TRUE TRUE TRUE TRUE
## Gender Division
## TRUE TRUE
To have a demoghraphy from runners I applied summary comand:
by(marathon_2017$Age, marathon_2017$M.F, summary)
## marathon_2017$M.F: F
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 31.00 40.00 39.95 48.00 84.00
## ------------------------------------------------------------
## marathon_2017$M.F: M
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 36.00 45.00 44.77 53.00 83.00
The youngest female finishers were 18 years old, and oldest female ones were 84, also youngest male finishers were 18 years old and oldest male runners were 83 years old.
library(ggplot2)
library(ggthemes)
ggplot(aes(x = Age, fill =M.F), data = marathon_2017) +
geom_histogram(position = position_dodge()) +
scale_x_continuous(breaks = round(seq(min(marathon_2017$Age), max(marathon_2017$Age),by = 5),10)) +
scale_colour_wsj() + theme_wsj()+
labs(title='Age and Gender of Runners')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
It seemes women are more eager to marathon when they are younger,specially when they are 25 and again when they are around 40,but as they becoe older,they lose eagerness to run,but in reverse men become interested to marathon slow by slow when they become older,specifically when they become 40,45,50 they love to join marathon races.after turning to 50 always more number of men are interested to run compare to women.
library(ggthemes)
ggplot(aes(x=Gender, y=Age, fill=M.F), data = marathon_2017) +
geom_boxplot() +
theme_economist()+ scale_color_economist() +theme(plot.title = element_text(size = 20, face = "bold", color = "tomato", hjust = 0.5))+
labs(title=' Boxplot of Runners by Age and Gender')
Intersting point in this plot is Female runnersare younger than male runners in first quartile, median and third quartile.Median for women is at their 40 but for men is at their 45.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
marathon_2017 %>%
mutate(Official_Time=lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(Official_Time)) %>%
group_by(M.F) %>%
summarize(Ave_Time=mean(New_time, na.rm=T),Ave_Age=mean(Age,na.rm=T),.group="drop")
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 4
## M.F Ave_Time Ave_Age .group
## <fct> <dbl> <dbl> <chr>
## 1 F 14945. 40.0 drop
## 2 M 13735. 44.8 drop
It could be seen Average time of finishing the race for women is more than men,also the women who end the race are younger compare to men.
New variable is defined as New_time to have finishing time of all the runners in seconds for easier analysis:
New_marathon <- marathon_2017 %>%
mutate(official_time = lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(official_time)) %>%
select(Age, M.F, official_time, New_time, everything())
New_marathon %>%
group_by(M.F, Age) %>%
summarize(ave_time = mean(New_time, na.rm = T),
ave_age = mean(Age, na.rm = T),
.groups = "drop")
## # A tibble: 125 x 4
## M.F Age ave_time ave_age
## <fct> <int> <dbl> <dbl>
## 1 F 18 17288. 18
## 2 F 19 15924. 19
## 3 F 20 15859. 20
## 4 F 21 14660. 21
## 5 F 22 15283. 22
## 6 F 23 14856. 23
## 7 F 24 14909. 24
## 8 F 25 14600. 25
## 9 F 26 14351. 26
## 10 F 27 14818. 27
## # … with 115 more rows
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
New_marathon %>%
group_by(M.F, Age) %>%
summarize(ave_time = mean(New_time, na.rm = T),
ave_age = mean(Age, na.rm = T),.groups = "drop") %>%
ggplot() +
geom_point(aes(x = Age, y = ave_time, color = M.F))+labs(title="Trend of Average Finishing Time VS. Age and Gender")+theme(legend.position = "bottom")+theme(plot.title = element_text(size = 16, face = "bold",color = "tomato"))
ggplotly()
This plot shows that if we like or dislike , ageis really important! As you see when people are between 20 to 40 years old average finishing time is much less compare to when they become older. Second interesting point is that usually in all periods of life men finish a marathon race sooner than women.Specially when they are so old (after 70 years old) men run more faster than women.
New_marathon %>%
group_by(M.F, Age) %>%
summarize(ave_time = mean(New_time, na.rm = T),
ave_age = mean(Age, na.rm = T),
.groups = "drop") %>%
ggplot() +
geom_density(aes(x = ave_time,color = M.F))+labs(title=" Average Finishing Time of Runners")+theme(plot.title = element_text(size = 20, face = "bold",color = "tomato"))
ggplotly()
Again we see there is a gender gap between male and female. on average male run faster than female. This gap is greater among the runners who finish sooner than among those who finish last. The peak finish time for male crossed finished line was in 13237 sec and 14825 sec for female.
New_marathon <- marathon_2017 %>%
mutate(official_time = lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(official_time)) %>%
select(Age, M.F, official_time, New_time, everything())
Runners<-New_marathon %>%
filter(Country=="JPN"|Country=="CAN"|Country=="ITA")
view(Runners)
With above commands we filter only 3 countries of Japan,Canada,Italy to analyse them in more detailes.
ggplot(data=Runners,mapping=aes(x=New_time,fill=M.F))+
geom_histogram(color = "darkorchid4")+labs(title="Gender distrubution in Different finishing Times for Specific 3 countries",subtitle="Japan,Italy,Canada")+scale_color_brewer(palette = "Pastel2")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplotly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
For only 3 selected country from 3 different continent,I checked gender distrubition in different finishing times,most of the men run in 12800 seconds while most of the women run in 13400 seconds.
#Filter fast runners who finish game in less than *9000 seconds*
Runners<-New_marathon %>%
filter(New_time<9000)
view(Runners)
library(ggrepel)
ggplot(data=Runners,mapping=aes(x=New_time,y=Age,color=M.F)) +
geom_point()+geom_line()+
xlab('Finish Time(sec)') +scale_color_fivethirtyeight()+
labs(title='Fast Runners Finishing Time')+geom_label_repel(aes(label = Overall))+theme(plot.title = element_text(size = 20, face = "bold",color = "tomato"))
This plot shows overall rank of a runner with its gender in finishing time of less than 9000 seconds. First one is a male who finishes in 7750 seconds and the first fast female runner finishes at 8600* seconds and her rank is 21th.
filter(New_marathon, M.F == "M",Country=="USA") %>%
mutate(Half_time =lubridate::hms(X10K)) %>%
mutate(New_Halftime =lubridate::period_to_seconds(Half_time)) %>%
mutate(official_time = lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(official_time)) %>%
ggplot(aes(x=New_time, y=New_Halftime, rm.na=T)) +
geom_point(aes(colour = Country)) +
geom_smooth(method='lm')+labs(title="Correlation of Halfway and Finishing Time",subtitle="USA")+coord_cartesian(xlim = c(7000, 25000),ylim = c (2000, 6000) )+
annotate(geom = "text",x = 15000, y = 4000,label = "Highly Correlated!")+theme(plot.title = element_text(size = 20, face = "bold",color = "tomato"))+xlab('Finish Time(sec)')+ylab("halftime(sec)")
## Warning: Problem with `mutate()` input `Half_time`.
## ℹ Some strings failed to parse, or all strings are NAs
## ℹ Input `Half_time` is `lubridate::hms(X10K)`.
## Warning in .parse_hms(..., order = "HMS", quiet = quiet): Some strings failed to
## parse, or all strings are NAs
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 25 rows containing non-finite values (stat_smooth).
## Warning: Removed 25 rows containing missing values (geom_point).
There is a correlation between finishing time and half time finishing in USA.
New_marathon1<-New_marathon %>%
mutate(Half_time =lubridate::hms(X10K)) %>%
mutate(New_Halftime =lubridate::period_to_seconds(Half_time)) %>%
mutate(official_time = lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(official_time))
## Warning: Problem with `mutate()` input `Half_time`.
## ℹ Some strings failed to parse, or all strings are NAs
## ℹ Input `Half_time` is `lubridate::hms(X10K)`.
## Warning in .parse_hms(..., order = "HMS", quiet = quiet): Some strings failed to
## parse, or all strings are NAs
head(New_marathon1)
## Age M.F official_time New_time Bib Name City State
## 1 24 M 2H 9M 37S 7777 11 Kirui, Geoffrey Keringet
## 2 30 M 2H 9M 58S 7798 17 Rupp, Galen Portland OR
## 3 25 M 2H 10M 28S 7828 23 Osako, Suguru Machida-City
## 4 32 M 2H 12M 8S 7928 21 Biwott, Shadrack Mammoth Lakes CA
## 5 31 M 2H 12M 35S 7955 9 Chebet, Wilson Marakwet
## 6 40 M 2H 12M 45S 7965 15 Abdirahman, Abdi Phoenix AZ
## Country X5K X10K X15K X20K Half X25K X30K X35K
## 1 KEN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19
## 2 USA 0:15:24 0:30:27 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19
## 3 JPN 0:15:25 0:30:29 0:45:44 1:01:16 1:04:36 1:17:00 1:33:01 1:48:31
## 4 USA 0:15:25 0:30:29 0:45:44 1:01:19 1:04:45 1:17:00 1:33:01 1:48:58
## 5 KEN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:41
## 6 USA 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:52
## X40K Pace Proj.Time Official.Time Overall Gender Division Half_time
## 1 2:02:53 0:04:57 - 2:09:37 1 1 1 30M 28S
## 2 2:03:14 0:04:58 - 2:09:58 2 2 2 30M 27S
## 3 2:03:38 0:04:59 - 2:10:28 3 3 3 30M 29S
## 4 2:04:35 0:05:03 - 2:12:08 4 4 4 30M 29S
## 5 2:05:00 0:05:04 - 2:12:35 5 5 5 30M 28S
## 6 2:05:21 0:05:04 - 2:12:45 6 6 1 30M 28S
## New_Halftime
## 1 1828
## 2 1827
## 3 1829
## 4 1829
## 5 1828
## 6 1828
There is a high correlation of 0.903 between finishing time and halfway time in USA for maraton runners.As hifhway time increases so does finishing time.
library(ggrepel)
filter(New_marathon,Overall<=20)%>%
mutate(official_time = lubridate::hms(Official.Time)) %>%
mutate(New_time = lubridate::period_to_seconds(official_time)) %>%
ggplot(fill="red")+theme(plot.title = element_text(size = 20, face = "bold",color = "tomato"))+
geom_col(aes(x = Overall,y= New_time,color="red"))+labs(title="Finishing Time of 20 First Runners")+xlab("Overall Rank")+ylab("Finishing Time")
ggplotly()
Filtering the first 20 runners,then plot overall ranks of them versus finishing time.First runners finish it in 7777 seconds,second one in 20 seconds later. The last one in 8418 seconds,so they really finish the race in a very close range.
New_marathon %>%
filter(Age<=25) %>%
ggplot(aes(x=Country,colour = Country,rm.na=T)) +
geom_bar(fill="pink") +
labs(title="Countries with youngest runners",xaxis ="Number of runners", yaxis = "Country")+theme(axis.text.x =element_text(angle=65,vjust=0.6))+coord_flip()+theme_void()
ggplotly()
Above plot shows countries which have youngest runnersunder 25 years old,as it is obvious USA with 1666 young runners has the most youngest runners among other countries. with a huge difference canada is the second country with 43 young runners under 25 years old.
#spatial visualization:
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
world <- ne_countries(scale = "medium", returnclass = "sf")
Runner_map <- read.csv("https://raw.githubusercontent.com/reisanar/datasets/master/marathon_results_2017.csv")
# Rename country code column to ISO_A3 so it matches what's in the Natural Earth shapefile
Runner_map <- rename(Runner_map, iso_a3 = Country)
new_map <- world %>%
left_join(Runner_map, by = "iso_a3")
participants <- Runner_map %>%
group_by(iso_a3) %>%
count()
participants %>%
arrange(desc(n))
## # A tibble: 91 x 2
## # Groups: iso_a3 [91]
## iso_a3 n
## <fct> <int>
## 1 USA 20945
## 2 CAN 1870
## 3 GBR 425
## 4 MEX 285
## 5 CHN 242
## 6 GER 226
## 7 BRA 205
## 8 AUS 191
## 9 JPN 170
## 10 ITA 165
## # … with 81 more rows
marathon <- world %>%
left_join(participants, by = "iso_a3")
ggplot(data = marathon) +
geom_sf(aes(fill = n), color = "grey") +
scale_fill_gradient(low = "#bfafd4",
high = "#857a94",
na.value = "#f8f7fa") +
xlab("Longitude") + ylab("Latitude") +
labs(
title = "Runner Population in Every Country",
barwidth = 15,
barheight = 0.5,
fill = NULL
) + theme_void()+theme(plot.title = element_text(size = 18, face = "bold",color = "tomato",hjust=0.5))
ggplotly()
Age_Rank <- Runner_map %>%
filter(Age<= 40,Overall<=100)
gender_map <- world %>%
left_join(Age_Rank, by = "iso_a3")
Young_Ranking <- Age_Rank %>%
group_by(iso_a3) %>%
count()
Young_Ranking %>%
arrange(desc(n))
## # A tibble: 10 x 2
## # Groups: iso_a3 [10]
## iso_a3 n
## <fct> <int>
## 1 USA 66
## 2 KEN 8
## 3 CAN 4
## 4 ETH 4
## 5 JPN 3
## 6 BDI 1
## 7 BRN 1
## 8 GBR 1
## 9 MEX 1
## 10 ZIM 1
marathon <- world %>%
left_join(Young_Ranking, by = "iso_a3")
ggplot(data = marathon) +
geom_sf(aes(fill = n), color = "grey") +
scale_fill_gradient(low = "#bfafd4",
high = "#857a94",
na.value = "#f8f7fa") +
xlab("Longitude") + ylab("Latitude") +
labs(
title = "Young Faster Runners Population in Every Country",
barwidth = 15,
barheight = 0.5,
fill = NULL
) + theme_void()+theme(plot.title = element_text(size = 16, face = "bold",color = "tomato",hjust=0.5))
ggplotly()
Gender_Rank <- Runner_map %>%
filter(M.F== "M")
gender_map <- world %>%
left_join(Gender_Rank, by = "iso_a3")
Gender_Ranking <- Gender_Rank %>%
group_by(iso_a3) %>%
count()
Gender_Ranking %>%
arrange(desc(n))
## # A tibble: 84 x 2
## # Groups: iso_a3 [84]
## iso_a3 n
## <fct> <int>
## 1 USA 10788
## 2 CAN 1055
## 3 GBR 290
## 4 MEX 190
## 5 CHN 165
## 6 GER 163
## 7 BRA 146
## 8 JPN 129
## 9 ITA 128
## 10 AUS 120
## # … with 74 more rows
marathon <- world %>%
left_join(Gender_Ranking, by = "iso_a3")
ggplot(data = marathon) +
geom_sf(aes(fill = n), color = "grey") +
scale_fill_gradient(low = "#bfafd4",
high = "#857a94",
na.value = "#f8f7fa") +
xlab("Longitude") + ylab("Latitude") +
labs(title = "Male Runners Population in every country",barwidth = 15,barheight = 0.5,fill = NULL)+
theme_void()+theme(plot.title = element_text(size = 16, face = "bold",color = "tomato",hjust=0.5))
ggplotly()
Gender_Rank <- Runner_map %>%
filter(M.F== "F",Age>=50)
gender_map <- world %>%
left_join(Gender_Rank, by = "iso_a3")
Gender_Ranking <- Gender_Rank %>%
group_by(iso_a3) %>%
count()
Gender_Ranking %>%
arrange(desc(n))
## # A tibble: 45 x 2
## # Groups: iso_a3 [45]
## iso_a3 n
## <fct> <int>
## 1 USA 1887
## 2 CAN 264
## 3 GBR 31
## 4 GER 28
## 5 JPN 25
## 6 ITA 19
## 7 CHN 16
## 8 AUS 14
## 9 MEX 14
## 10 BRA 9
## # … with 35 more rows
marathon <- world %>%
left_join(Gender_Ranking, by = "iso_a3")
ggplot(data = marathon) +
geom_sf(aes(fill = n), color = "grey") +
scale_fill_gradient(low = "#bfafd4",
high = "#857a94",
na.value = "#f8f7fa") +
xlab("Longitude") + ylab("Latitude") +
labs(title = "Which Countries have Older Female Runners Population?",barwidth = 15,barheight = 0.5,fill = NULL)+
theme_void()+theme(plot.title = element_text(size = 15, face = "bold",color = "tomato",hjust=0.5))
ggplotly()
I chose this dataset because when I was studying my bachelor I joind marathon group of my previous university and I had a memorable time those days.So his dataset could decode some new aspects of this sport for me in other countries. I started ploting some basic plot at first to have a general opinion about the whole topic,then shifted to some interactive plot and finally some spatial plot.bellow you can find some interesting issues which I found from these plot:
The youngest female finishers were 18 years old, and oldest female ones were 84 years old.
youngest male finishers were 18 years old and oldest male runners were 83 years old.
Women are more eager to marathon when they are younger,specially when they are 25 and again when they are around 40,but as they becoe older,they lose eagerness to run.
Men become interested to marathon slow by slow when they become older,specifically when they become 40,45,50 they love to join marathon races.after turning to 50 always more number of men are interested to run compare to women.
Female runnersare younger than male runners in first quartile, median and third quartile.Median for women is at their 40 but for men is at their 45.
Average time of finishing the race for women is more than men,also the women who end the race are younger compare to men.
Age is really important!as you see when people are between 20 to 40 years old average finishing time is much less compare to when they become older.
Men finish marathon race sooner than women.Specially when they are so old (after 70 years old) men run more faster than women.
There is a gender gap between male and female. On average male run faster than female. This gap is greater among the runners who finish sooner than among those who finish last.
The peak finish time for male crossed finished line was in 13237 sec and 14825 sec for female.
For only 3 selected country (Japan,Canada,Italy) from 3 different continent,gender distrubition in different finishing times,most of the men run in 12800 seconds while most of the women run in 13400 seconds.
For finishing time of less than 9000 seconds,First one is a male who finishes in 7750 seconds and the first fast female runner finishes at 8600 seconds and her rank is 21th.
There is a high correlation of 0.903 between finishing time and halfway time in USA for maraton runners.As halfhway time increases so does finishing time.
Filtering the first 20 runners,first runners finish it in 7777 seconds,second one in 20 seconds later. The last one in 8418 seconds,so they really finish the race in a very close range.
In countries which have youngest runnersunder 25 years old,as it is obvious USA with 1666 young runners has the most youngest runners among other countries. with a huge difference canada is the second country with 43 young runners under 25 years old.
Calculating the number of participants in every country (using count function) shows that USA has the first rank of participants with 20945 runners with huge difference even with the second country which Canada with 1870. Some countries only had 1 participant such as Falkland Islands,Bulgaria and Jamaica.
Then Young Faster Runners Population in every country was checked and again USA with 66 runners under 40 was the first ranked country.
After filltering the male runners to check the gender balance of this race,it shows that for countries such as USA and Canada there is a fair gender balance.
Finally I was really curious to know how is the trend of old women(more than 50s) in participating in marathon sport,and it was intersting to see this distrubution in different countries,again USA with 1887 female participants has the first ranking of this category.
Different type of plots were created including:histogram,scatterplot,coloumn bar,density,box plot to visualize different aspects of this dataset.
Gestalt Principles tried to be considered specially items such as proximity,similarity,continuty and closure.
The Big Four: CRAP
Contrast*: make different by different fonts and color.
Repetition*:Repeat visual elements (colors, fonts, shapes) to create a strong unified theme.
Alignment*: try to have a visual connection among items in the plots
Proximity*: group related items
Using some functions such as theme() function with installing ggthemes package to create different type of theme for every plot.
Customizing the look with coord_cartesian and defining x and y limit for our plot.Modify tilte location with hjust,flipping the axis with coord_flip() and geom_label_repel to add label to plot.Finally I used annotation to add specific point of the chart when was useful.
Using some different color pallet to emphasize on important sections and results.
I tried to use bright or dark colors to highlight information that requires greater attention.in this dataset there was a unordered categorical (gender) required distinct colors that will not be easily confused with one another.
Moreover, whenever I needed a special color a Hexadecimal color was used.
Drawing spetial plot was challenging for me at first and I got many guidance from Dr.Sanchez,but after being able to a variety of plots their interpretation was interesting for me, also as abbraviation of countries names were avialable,package of rnaturalearth was used and by using function of leftjoin dataframe of marathon_2017 and datashape of world were joined to eachother.